Predicting perinatal health outcomes using smartphone-based digital phenotyping and machine learning in a prospective Swedish cohort (Mom2B): study protocol

Introduction
Perinatal complications, such as perinatal depression and preterm birth, are major causes of morbidity and mortality for both mother and child. Identifying women at high risk can allow existing preventive interventions to be delivered earlier. This ongoing study aims to use digital phenotyping data from the Mom2B smartphone application to develop models that identify women at high risk of mental and somatic complications.
Methods and analysis
All Swedish-speaking women over 18 years of age who are either pregnant or within 3 months postpartum are eligible to participate by downloading the Mom2B smartphone app. We aim to recruit at least 5000 participants with completed outcome measures. Throughout pregnancy and the first year postpartum, both active and passive data are collected via the app in an effort to establish each participant's digital phenotype. Active data collection consists of surveys on participant background information, mental and physical health, lifestyle, and social circumstances, as well as voice recordings. Participants' general smartphone activity, geographical movement patterns, social media activity and cognitive patterns can be estimated through passive data collection from smartphone sensors and activity logs. Outcomes will be measured using surveys, such as the Edinburgh Postnatal Depression Scale (EPDS), and through linkage to national registers, from which information on registered clinical diagnoses and received care, including prescribed medication, can be obtained. Advanced machine learning and deep learning techniques will be applied to these multimodal data to develop accurate algorithms for predicting perinatal depression and preterm birth, which may make earlier intervention possible.
Ethics and dissemination
Ethical approval has been obtained from the Swedish Ethical Review Authority (dnr: 2019/01170, with amendments), and the project fully complies with the requirements of the General Data Protection Regulation (GDPR). All participants provide consent to participate and can withdraw at any time. Results from this project will be disseminated in international peer-reviewed journals and presented at relevant conferences.


Overview
This protocol paper describes an innovative and impactful project, the Mom2B Study, that aims to use active and passive data collection in order to predict perinatal depression and preterm birth among a large-scale national sample of Swedish-speaking women. Given their use of digital phenotyping and "big data" methods, the authors plan to employ machine learning and deep learning analytic techniques in order to develop predictive algorithms for perinatal depression and preterm birth. The questions, comments, and suggested revisions below are intended to strengthen this protocol paper for possible publication.
Data Collection
1. Tables 2 and 3 offer helpful visuals for understanding which surveys will be administered and when. However, it is unclear how survey administration decisions were made. For example, why is the EPDS administered multiple times during postpartum weeks 0-27 but only once during weeks 28-40 and once during weeks 40-52? Recent research indicates that postpartum women in the United States may be more likely to die by suicide and/or drug overdose between postpartum months 9-12 (e.g., Mangla et al., 2019), so it may be worth considering more frequent EPDS data collection during the later postpartum time points.
2. It would be helpful to know more details about the surveys listed in Table 3. For instance, what surveys will be administered to measure sleep and breastfeeding? Are these measures validated or self-developed? Without this information, it seems as if this study cannot easily be repeated.
3. Given that the authors plan to collect voice recording data, have they considered collecting recordings of infant cry vocalizations? There has been some interesting machine learning-based research on infant cry vocalizations and potential associations with postpartum depression (e.g., Gabrieli [...]
[...] Without the in-text information under Data Flow and Storage, Figure 2 is challenging to understand. If Figure 2 will not be placed next to this in-text section upon publication, the authors are encouraged to move this information to the Figure 2 caption.

REVIEWER
Heaukulani, Creighton
Ministry of Health, Office for Healthcare Transformation
REVIEW RETURNED
28-Jan-2022

GENERAL COMMENTS
I'm glad to see this study. The application of digital phenotyping to this population is likely to be highly impactful, and its investigation here is very timely. What's more, the size of the cohort and in particular the development of the Mom2B App covering both iOS and Android, and its distribution to so many phones, are impressive. The authors should be proud of these accomplishments.
I request that the data analysis section be expanded to include specific plans for linear modelling and extraction of corresponding inferences. I understand that the authors are leaving much of the deep learning modelling to exploration, and this is great! But you should also have plans for linear models (hierarchical/multilevel/random effects models). In my opinion, these should always be carried out and studied before playing around with deep learning models anyway, which are usually only useful for prediction. Showing inferences from the linear models will standardize the results across the literature and will most likely produce very interesting insights. I do not think conducting this analysis would be hard or overly time consuming, and I believe it will help produce insights to guide how you eventually construct your deep learning architectures or approach feature engineering. Perhaps the authors intended to do this when they said that "traditional ML" methods will be explored, but it should be made explicit. In the spirit of pre-declaration, the protocol should include very specific data analysis plans including a specific model and the way significance of inferences will be determined, how multiple comparisons are avoided, etc. Currently this does not exist in the protocol.

VERSION 1 - AUTHOR RESPONSE
Comments from Reviewer 1
Data collection
1. Tables 2 and 3 offer helpful visuals for understanding which surveys will be administered and when. However, it is unclear how survey administration decisions were made. For example, why is the EPDS administered multiple times during postpartum weeks 0-27 but only once during weeks 28-40 and once during weeks 40-52? Recent research indicates that postpartum women in the United States may be more likely to die by suicide and/or drug overdose between postpartum months 9-12 (e.g., Mangla et al., 2019), so it may be worth considering more frequent EPDS data collection during the later postpartum time points.
Thank you for this suggestion. Based on your comments, it is apparent that our tables lacked the clarity we were hoping for. We have replaced Tables 2 and 3 with new Figures 2 and 3. These figures illustrate more precisely when the surveys are delivered, how long they remain available for completion, and how many times in total they occur throughout the study period. We hope this addresses the matter satisfactorily.
Regarding the importance of assessing depression in the late postpartum period, you raise an important point. However, the EPDS was developed as a tool to screen for depressive symptoms during the early postpartum period, ideally in weeks 6-12 after birth. This is also the period when the onset of symptoms is most common. Such symptoms could, of course, persist and lead to severe outcomes such as suicide later in the postpartum period. The EPDS has also been shown to be a valid measure of depression during pregnancy (Levis et al., 2020). We absolutely agree with the need to continue screening in the late postnatal period. While we deliver the EPDS only twice during that period, we also continue to monitor emotional wellbeing through the DSM-5 short questionnaires and the WHO-5 Wellbeing Index, the latter being delivered biweekly.
Moreover, information on diagnoses, psychiatric hospitalizations and prescriptions of psychiatric medications will also be acquired from patient registers. This will supplement our outcome measures even in the late postpartum period.
2. It would be helpful to know more details about the surveys listed in Table 3. For instance, what surveys will be administered to measure sleep and breastfeeding? Are these measures validated or self-developed? Without this information, it seems as if this study cannot easily be repeated.
Thank you for pointing this out. We agree that the tables may not have described the surveys used and their timeline sufficiently clearly. Moreover, we noticed that certain validated surveys had incorrectly been listed as self-developed questionnaires and were missing from Tables 2 and 3, which was an oversight on our part. As mentioned above, we have now replaced Tables 2 and 3 with Figures 2 and 3, respectively, which distinguish between validated and self-developed surveys. We have taken care to list all self-developed surveys comprehensively, according to the construct or subject they assess. We hope these changes resolve the concerns about replicability and clarify the factors taken into account in our analysis.
3. Given that the authors plan to collect voice recording data, have they considered collecting recordings of infant cry vocalizations? There has been some interesting machine learning-based research on infant cry vocalizations and potential associations with postpartum depression (e.g., Gabrieli [...]).
Thank you for the suggestion. It would certainly be interesting to explore this as a future research topic, perhaps with a subset of women from the Mom2B cohort. However, there are practical reasons why this may be too ambitious to accomplish in the ongoing cohort. Collecting data from infants poses practical challenges (Ji et al., 2021) and would require additional consent from the co-parent. Moreover, by placing greater demands on new mothers, we would risk a higher opt-out rate. Nevertheless, given the ongoing status of the Mom2B cohort, we thank the Reviewer for bringing this to our attention, and we remain open to such a study once we have assessed its feasibility.
4. Are passive data being collected continuously from pregnancy through 52 weeks postpartum? Clarification on this design consideration would be helpful.
We have revised the sub-section 'Passive data collection' to clarify the nature of passive data collection with the following amended statement: "Passive data that the user has provided consent for are continuously collected via the Mom2B app throughout the study period, and are used to infer the user's behavioral patterns". We have also added a brief sentence under the sub-heading "Data Collection" to clarify the periods and conditions under which data are collected, as follows: "Data can be collected from the first week of pregnancy up to week 52 after birth. Only data that participants have consented to are collected from the time they register for the study, and they can change their consent preferences in the app at any time if they wish to stop". We hope this resolves the lack of clarity.
5. Given that the authors are planning to collect data on women's experiences with previous miscarriages, abortions, and feelings about childbirth, they may consider assessing if the present pregnancy was unplanned and/or unwanted, as unplanned pregnancy is a risk factor for postpartum depression (e.g., Faisal-Cury et al., 2017; Yanikkerem et al., 2012).
We agree that this is an important factor to consider. We had previously been assessing pregnancy planning using a single, direct question, which was missing from the previous tables listing the surveys. We have since incorporated the London Measure of Unplanned Pregnancy (Barrett et al., 2004), a validated survey that assesses how planned the pregnancy was. The survey is now included in Figure 2, which illustrates the validated surveys.
Strengths and Limitations
1. It would be helpful if the authors provided more details about "the involvement of participants" in their study (p. 11). For instance, what mental health information is included in the reports that are sent back to participants? How often do participants receive these reports? Are the participants made aware that these reports are based on research and not professional clinical assessment? If participants contact the research team for mental health support, what protocol do the authors follow? What resources are the authors using to inform these ethical decisions?
We agree that the information regarding the reports sent back to participants should have been elaborated. We have revised the 'Strengths and limitations' section to clarify exactly what information is sent back to the users, that is, statistics and weekly informational reports. The text now reads: "Statistics based on WHO-5 and behavioral data (movement, internet usage, sleep etc.) collected from participants are sent to the user, allowing them to follow their wellbeing and activity as an incentive for continued participation. Weekly informational reports regarding common experiences and concerns for both the mother and child for that particular week of the perinatal period, based on information taken from 1177.se (Swedish healthcare service), are available to users and allow them to easily stay informed". We hope it is now clear within the text that the reports are sent weekly, according to the perinatal week the women are in. We have added the phrase "based on information taken from 1177.se (Swedish healthcare service)" to clarify where the information in the informational reports comes from. This information is also clearly stated at the end of each report sent to participants.
As for the missing explanation of the protocol for women who score highly on the EPDS, we thank you for pointing this out. We have added this information, together with the explanation that our protocol follows standard guidelines, with a citation. The text now reads: "As per standard guidelines[103], if participants receive a high score on the EPDS, they are prompted to contact their healthcare provider or emergency support services for support, and if unsure, they can contact the research team, which will help them find appropriate support for their needs. Continuous contact is maintained with participants until they find support".
2. Relatedly, I wonder if it is possible that regular reports about participants' mental health and digital phenotyping data may influence ongoing data that they provide. Have the authors considered this possibility? If so, how do they plan to account for these effects?
We thank the Reviewer for raising this important point. Participants are frequently informed about their own mood state and about the pregnancy or postnatal period, and they see a statistical summary of their behavioral data. There is, of course, a possibility that this may act as an intervention of sorts and influence their responses on certain surveys. We nevertheless believe that continuing to provide support and easy access to this information is an important ethical obligation on our part, as well as an incentive for continued participation. This includes information about the perinatal period in general as well as about participants' own mood and activity. However, the Reviewer's comment has made us consider ways of accounting for these effects. We are looking into acquiring app metadata on how often each user checked the weekly reports and statistics, and incorporating that information as a variable in our models. It is worth noting that the resulting models are intended to generalize to digital settings, since other variables, such as passive data, can in any case only be acquired through some kind of digital device. This is also now included among the limitations of our study.
Minor Comments
1. The authors are encouraged to review their manuscript for occasional grammatical errors (e.g., "the app could be further developed to include evidence-based interventions interventions" [p. 12]).
Thank you for bringing this to our attention. We have carefully reviewed the draft again to identify any grammatical or spelling errors.
2. Without the in-text information under Data Flow and Storage, Figure 2 is challenging to understand. If Figure 2 will not be placed next to this in-text section upon publication, the authors are encouraged to move this information to the Figure 2 caption.
We completely agree. However, all figures, including Figure 2 (now labelled Figure 4), must be submitted to BMJ Open as separate files and are therefore not placed in the document submitted to BMJ. We assume they will be placed near the in-text section titled Data Flow and Storage upon publication.
Comments from Reviewer 2
I request that the data analysis section be expanded to include specific plans for linear modelling and extraction of corresponding inferences. I understand that the authors are leaving much of the deep learning modelling to exploration, and this is great! But you should also have plans for linear models (hierarchical/multilevel/random effects models). In my opinion, these should always be carried out and studied before playing around with deep learning models anyway, which are usually only useful for prediction. Showing inferences from the linear models will standardize the results across the literature and will most likely produce very interesting insights. I do not think conducting this analysis would be hard or overly time consuming, and I believe it will help produce insights to guide how you eventually construct your deep learning architectures or approach feature engineering. Perhaps the authors intended to do this when they said that "traditional ML" methods will be explored, but it should be made explicit. In the spirit of pre-declaration, the protocol should include very specific data analysis plans including a specific model and the way significance of inferences will be determined, how multiple comparisons are avoided, etc. Currently this does not exist in the protocol.
We thank the Reviewer for these thoughtful suggestions. We would like to clarify that the primary aim of the current project is to develop prediction models, not inferential models. For this purpose we do plan to use logistic regression, among other machine learning methods, but not for hypothesis testing and the corresponding extraction of inferences. The multilevel models the Reviewer suggests are not part of our plan for the time being. Instead, we plan to construct our feature set using traditional feature engineering and selection techniques, as well as deep learning (DL) techniques.
We agree with the recommendation to elaborate our analysis plan. We have revised the sub-section titled 'Preliminary data analysis strategy' to give more explicit details of our analysis plan. These include how we will approach feature extraction, which models we plan to use, and the evaluation criteria for our predictive models. We agree that it is important to highlight the variables of importance for implementation in healthcare settings (explainable AI). Healthcare professionals may be reluctant to trust and use algorithms they do not understand (the black box problem). However, the importance of the variables will be based on their predictive value, not on causal effects. Beyond our primary aim with these data, it is of course true, as the Reviewer suggested, that the Mom2B project will generate a large and detailed dataset. Part of the purpose of this Study Protocol is to make our dataset known so that collaborations can be established and further sub-studies planned. Although we would ideally pre-specify all such plans, at this point we do not have an overview of the sub-studies that the project may inspire. However, we fully agree with the importance of pre-specifying analysis plans, especially for confirmatory hypothesis testing. Given that the sub-studies will involve secondary use of already collected data, the preregistration would need to make clear what access the authors have had to the data prior to writing the protocol. As such, full study protocols and analysis plans for these and other future sub-studies will ideally be registered prospectively, before analysis, together with disclosure of prior knowledge about the study data. For exploratory and confirmatory hypothesis tests alike, we will report all analyses performed, to avoid selective reporting among multiple analyses.
In sum, even if a Study Protocol such as this does not fulfill the same purpose as a preregistered protocol, we hope that it contributes to transparency by declaring what data will be available for secondary use.
Changes made in document